Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

peptides as prototypes for inhibitor design.

The bio-basis function space was supported by the cleaved peptides of the factor Xa protease data. Afterwards, the party package was used to generate a random forest model. In a tree of such a model, all the cleaved peptides used as the bio-basis functions were ranked. Some peptides were ranked and some were not. Figure 3.51(a) shows such a tree, in which it can be seen that the peptide ULSRU was ranked at the top, i.e., as the most important cleaved peptide. The smallest p value, which belongs to this peptide, indicates its significance as the one closest to the prototype. The next most important peptides were LQFRU and UWWRU. Figure 3.51(b) shows the ROC curve of this model, where the Dayhoff matrix was used. It must be noted that this unique feature may not be feasible in other machine learning algorithms.
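The idea of a bio-basis function space supported by cleaved peptides can be illustrated with a minimal sketch. It assumes the form usually given for the bio-basis function, exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), where s is a summed pairwise similarity score. The chapter's own definition is not shown in this passage. The sketch is written in Python rather than R purely for illustration. The names identity_score, bio_basis and to_feature_space, the value of gamma, and the use of only the three peptides named above as supports are all hypothetical. The identity scoring is a stand-in for the Dayhoff matrix, so the numbers it produces are not those of the model discussed here.

```python
import math


def identity_score(a, b):
    """Placeholder similarity: number of positions where the residues match.

    Assumption: this replaces the Dayhoff matrix, whose values are not
    reproduced here.
    """
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def bio_basis(peptide, support, gamma=1.0, score=identity_score):
    """Similarity of a peptide to one support peptide.

    Returns exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), which is 1 when the
    peptide equals the support and decreases as the similarity drops.
    """
    s_bb = score(support, support)
    return math.exp(gamma * (score(peptide, support) - s_bb) / s_bb)


def to_feature_space(peptide, supports, gamma=1.0):
    """Map a peptide to one numeric feature per support peptide."""
    return [bio_basis(peptide, b, gamma) for b in supports]


# Hypothetical support set: the three peptides named in the text.
supports = ["ULSRU", "LQFRU", "UWWRU"]
features = to_feature_space("ULSRU", supports)
print(features)
```

Each peptide becomes a numeric vector with one entry per support peptide. A tree-based model fitted to these vectors therefore splits on, and so ranks, the support peptides themselves.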

Figure 3.51 (a) One of the trees generated by the bio-random forest model for the factor Xa protease cleavage data, built with the party package for ranking peptides. (b) The ROC curve of this model. The AUC was 0.956.
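As a reminder of what the AUC in the caption measures, the following minimal sketch computes it as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name auc and the toy labels and scores are hypothetical and are unrelated to the factor Xa data, so the result is not comparable with the 0.956 reported above.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g., cleaved), 0 for negative.
    scores: model outputs, where a higher value means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
toy_auc = auc(toy_labels, toy_scores)
print(toy_auc)
```

For the toy data, 15 of the 16 positive-negative pairs are ordered correctly, giving 0.9375. A value of 1 would mean perfect separation and 0.5 would mean no better than chance.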

This chapter has introduced various discriminant analysis algorithms, both linear and nonlinear, for cleavage pattern discovery.